fix semantic tokens #13

Open
wants to merge 7 commits into
base: main

zhvng
Owner

@zhvng zhvng commented Apr 19, 2023

  • move normalization into the hubert model. brings back part of 77ee0f4. (realized the approach discussed in Hubert args normalization #7 is not a good idea)
  • simplify data processing and remove redundant operations
  • changes to the way semantic tokens are computed in preprocessing: MERT is trained on a context window of 5 seconds (!), which might have explained the deterioration in sample quality over longer sequences noticed by @Saltb0xApps in the discord. Update: the m-a-p team confirmed that MERT generalizes to longer context lengths and still holds SOTA performance on various tasks. Will leave the option to specify a shorter hubert context length, but it won't be used unless specified in the config.
  • bin_size option to average adjacent hubert features to reduce the number of semantic tokens
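
For reviewers, a rough sketch of the two preprocessing ideas above: splitting audio into fixed-length context windows and averaging adjacent features in groups of `bin_size`. This is not the code in this PR. The function names `split_into_windows` and `average_bins` are hypothetical, plain Python lists stand in for tensors, and dropping trailing frames that don't fill a whole bin is an assumption made only for illustration.

```python
from typing import List, Sequence


def split_into_windows(samples: Sequence[float], window_len: int) -> List[Sequence[float]]:
    """Split a 1-D signal into consecutive windows of at most `window_len` samples.

    Hypothetical helper: stands in for feeding the model a shorter context length.
    The last window may be shorter than `window_len`.
    """
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    return [samples[i:i + window_len] for i in range(0, len(samples), window_len)]


def average_bins(features: List[List[float]], bin_size: int) -> List[List[float]]:
    """Average every `bin_size` adjacent feature vectors into one vector.

    `features` is a list of frames, each frame a list of floats of equal length.
    Assumption: trailing frames that do not fill a complete bin are dropped.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    n_bins = len(features) // bin_size
    binned = []
    for b in range(n_bins):
        group = features[b * bin_size:(b + 1) * bin_size]
        # zip(*group) iterates over feature dimensions across the frames in the bin
        binned.append([sum(column) / bin_size for column in zip(*group)])
    return binned


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    # 5 frames with bin_size=2 leave 2 averaged frames; the last frame is dropped
    print(average_bins(frames, bin_size=2))
    print(split_into_windows(list(range(5)), window_len=2))
```

With `bin_size=2`, the number of feature frames (and so the number of semantic tokens after quantization) is roughly halved.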

@zhvng zhvng marked this pull request as ready for review April 20, 2023 20:41